arXiv:1508.02906v2 [q-bio.PE] 24 Aug 2016 


CONVERGENCE OF RANDOM WALKS TO BROWNIAN 
MOTION IN PHYLOGENETIC TREE-SPACE 
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Abstract. The set of all phylogenetic (or evolutionary) trees for a fixed set 
of species forms a manifold-stratified geodesic metric space known as Billera- 
Holmes-Vogtmann tree-space. In order to analyse samples of phylogenetic trees 
it is desirable to construct parametric distributions on this space, but this task 
is very challenging. One way to construct such distributions is to consider par¬ 
ticles undergoing Brownian motion in tree-space from a fixed starting point. 
The distribution of the particles after a given duration of time is analogous to a 
multivariate normal distribution in Euclidean space. Since these distributions 
cannot be worked with directly, we consider approximating them by suitably 
defined random walks in tree-space. We prove that as the number of steps 
tends to infinity and the step-size tends to zero, the distribution determined 
by the transition kernel of the random walk converges to that corresponding to 
Brownian motion. This result opens up the possibility of statistical modelling 
using distributions obtained from transition kernels of random walks and Brow¬ 
nian motion on tree-space. The methods developed here could be extended to 
establish similar results on other Euclidean complexes or manifold-stratified 
spaces. 


1. Introduction 

Phylogenetic trees represent evolutionary relationships between a set S of bi¬ 
ological objects, typically genetic sequences obtained from different present-day 
species. Internal vertices correspond to historic divergence events and the leaf ver¬ 
tices correspond to elements in S. The trees are edge-weighted, and these weights 
represent the degree of evolutionary divergence between species. The set of all phy¬ 
logenetic trees for a fixed set S forms a geodesic metric space usually referred to as 
Billera-Holmes-Vogtmann (BHV) tree-space [2]. Samples of trees arise from many 
biological analyses, for example by constructing a phylogenetic tree for each gene in 
a collection of different genes present in all species S. Medical images of branching 
structures in the body, such as blood vessels [13] and lung airways [5], can also be 
represented as phylogenetic trees, and samples of images have been analysed using 
methods based in BHV tree-space. 

To date, most methods for analysing data sets in BHV tree-space have involved 
least squares estimation using the geodesic metric: there are methods for comput¬ 
ing sample means [1, 10] and for fitting principal geodesics [5, 11], for example. A 
second approach involves maps samples of trees to a tangent space and performing 
analysis on that space [15]. On the other hand, development of fully probabilistic 
models is difficult due to the challenging problem of construction of useful proba¬ 
bility distributions on tree-space. Some recent work has focussed on kernel density 
estimation for non-parametric estimation of distributions on tree-space [14]. In this 
article we consider stochastic processes on tree-space, and use Brownian motion to 
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construct probability distributions. Consider particles undergoing Brownian mo¬ 
tion from some fixed point Sq in tree-space: if we run the Brownian motion for 
some fixed time duration to then the corresponding distribution of particles will be 
denoted B{xo,to). This distribution consists of a ‘bump’ of density on tree-space 
characterized by the location of the source xq and the time to which plays the role 
of a dispersion parameter. If the same process runs in Euclidean space then the 
corresponding distribution is the multivariate normal distribution with mean xo 
and variance-covariance matrix to x I. The density function for B{xo,to) for the 
space of unrooted phytogenies has been explicitly calculated in the cases of IS”! =4 
and 5 species [12]. However, for [S'! > 5 calculation of the density function becomes 
very difficult since it involves gluing together spherical harmonics and higher dimen¬ 
sional analogues to create eigenfunctions of the Laplacian on tree-space. Rather 
than attempting to calculate the density function explicitly, an alternative is to 
approximate Brownian motion using a suitably defined random walk. This has 
the additional benefit that random walks are easy to simulate, and it opens the 
possibility of using stochastic processes on tree-space more generally. 

In this paper we address two fundamental issues: (i) existence of Brownian 
motion as a well-defined Markov process on tree-space, and (ii) convergence of ran¬ 
dom walks to Brownian motion under a certain limit. BHV tree-space is a manifold 
stratified space: it consists of a collection of regions which are isomorphic to the 
positive orthant in K", with one such region for each labelled tree shape. These 
regions are glued along their faces in a way determined by the combinatorics of the 
tree shapes. Brownian motion on tree-space proceeds in the same way as for K” 
on the interior of each such region. At boundaries between regions, the particles 
undergoing Brownian motion move with equal probability into each of the neigh¬ 
bouring regions. In Lemma 3.4 we give an explicit construction of a probability 
measure defined on the set of continuous paths in tree-space which corresponds 
to this Markov process. Our main result is Theorem 4.7, in which we prove that 
suitably defined random walks converge to Brownian motion on BHV tree-space. 
The method of proof involves projecting sample paths in tree-space down onto a 
single positive orthant in K" and pulling-back into tree-space. The main technical 
challenge involved in proving Theorem 4.7 arises as particles undergoing Brownian 
motion can potentially cross the boundaries between the regions for different la¬ 
belled tree shapes an infinite number of times. Brownian sample paths can therefore 
traverse an infinite sequence of these regions, and some careful analysis is required 
to handle this difficulty. 

The existing literature on Brownian motion and random walks in manifold- 
stratified spaces is limited. Enriquez and Kifer [4] proved a version of Donsker’s 
theorem on graphs and showed that random walks along the edges of graphs con¬ 
verge to Brownian motion. The task of proving convergence of random walk to 
Brownian motion on graphs is a lower-dimensional analogue of proving conver¬ 
gence on tree-space: Brownian sample paths on a graph which hit a given vertex 
V traverse almost surely an infinite sequence Ei,E 2 ,. ■ ■ of edges, where each Ei is 
one of the edges {ei,..., 6^} incident at v, and this is analogous to the behaviour 
of sample paths in tree-space which hit the boundary between regions. Brin and 
Kifer [3] considered Brownian motion on 2-dimensional Euclidean complexes. They 
proved existence of Brownian motion on these complexes and established properties 
of the infinitesimal generator. However they did not consider random walks and 
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SO the majority of their results are unrelated to this paper. We make some further 
comments about the proof of existence in [3] in Section 3.3. Proof of convergence of 
random walks to Brownian motion on more general Euclidean complexes or mani¬ 
fold stratified spaces remains an open problem for which this paper and [4] can be 
seen as special cases. Stratihed spaces are playing an increasingly important role in 
mathematical biology and statistics [8, 9], particularly in the analysis of branching 
structures in biology. 

Our motivation for studying Brownian motion and other stochastic processes 
on tree-space arises from problems in statistical phylogenetics. For example, given 
the evolutionary tree relating a group of species (the species tree), individual genes 
evolve according to trees which are potentially different from the species tree (called 
gene trees). Population genetic models are used to relate the distribution of gene 
trees to the underlying species tree. Stochastic processes on tree-space might offer a 
more tractable, computationally efficient, means of modelling these relations. More 
generally, the distributions we construct in this paper could be used in to represent 
uncertainty in tree-space versions of generic statistical methods such as regression 
models or model-based clustering. 

The remainder of the paper is structured as follows. Section 2 contains back¬ 
ground material on the geometry of tree-space and Brownian motion in Euclidean 
space; we prove a result concerning intersections between Brownian sample paths 
and coordinate hyperplanes which the proof of convergence relies on. In Section 3 
we construct Brownian motion on tree-space as a probability measure on paths in 
tree-space and show this corresponds to a well-defined Markov process. In Section 4 
we define random walks on BHV tree-space and prove they converge to Brownian 
motion under a certain limit. When xg is not a binary tree, both existence of the 
stochastic processes and convergence of the random walk require special attention, 
and we do not give complete results for this case. This situation is discussed in 
Section 5. We give some concluding remarks in Section 6. 


2. Background 

2.1. The geometry of tree-space. The combinatorial and geometric structure of 
Billera-Holmes-Vogtmann tree-space was originally described in [2] . We give a brief 
description here of the essential elements we require, but more detail can be found 
in the original paper. In particular, we are more concerned with the combinatorial 
structure; geodesics and the corresponding metric structure are not required for 
most of our results. 

Tree-space Tn is the set of edge-weighted unrooted trees with N leaves labelled 
by a 1-to-l correspondence with a fixed set of species S = {1,2,..., N}. The 
space of rooted trees can easily be obtained from the unrooted space by adding an 
extra species 0 to S, so that 0 is by definition attached to the root vertex. The edge 
weights take values in K > 0 and are usually represented graphically as edge lengths. 
Each element of Tjv contains N edges which end in a leaf, referred to as pendant 
edges. Trees are called binary or (from the biological literature) fully-resolved if 
every vertex has degree 3 apart from the leaves, in which case the tree contains 
exactly 2N — 3 edges. A tree containing fewer internal edges is called unresolved, 
and will contain at least one vertex of degree 4 or more. For convenience we define 
N' = N — 3, the number of internal edges on a fully-resolved tree. Two trees Ti, T 2 
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are said to have the same topology if they are identical when the edge weights are 
ignored. There are (2iV — 5)!! fully-resolved topologies for N species. 

A split is a partition of S into two disjoint subsets. Given a tree x G Tn, cutting 
any edge on x partitions S in eactly this way, so edges are determined by the splits 
they induce. Any tree can therefore be regarded as a weighted set of splits which 
satisfy a certain compatibility relation. Arbitrary sets of splits do not represent 
trees: for example two the splits {1, 2}|{3,4,..., A^} and {1, 3}|{2,4,..., A^} are 
not compatible and cannot be simultaneously represented on a tree. Given a tree 
X and a split e we write ixie^) to denote the weight (or length) assigned to e in x, 
and dehne ixie) = 0 when e does not correspond to any edge in x. Tree topologies 
correspond to unweighted sets of compatible splits. We say a fully-resolved topology 
T resolves a topology r' when, as a set of splits, t' C t. Splits are pendant if they 
correspond to pendant edges, or internal otherwise. 

The weights associated with pendant edges will be ignored for the present. The 
set of all trees with some fixed fully-resolved topology r can then be put into corre¬ 
spondence with the interior of the positive orthant = [0,oo)^ by associating 
each internal split with a coordinate axis in . By taking a tree with topology r 
and shrinking an edge down to length zero, so that the corresonding split is removed 
from the tree, a point on the boundary of R^ is obtained. The boundary of R^ 
therefore corresponds to the set of trees with topologies resolved by r. Specifically, 
each codimension-fc face of R^ corresponds to a topology in which k internal splits 
have been removed from r. The orthant Or C Tn consists of all trees with topology 
T together with all trees with unresolved topologies r' C t, and so Or = R+ ■ 

Suppose a single edge is contracted down to length zero in a fully resolved tree 
with topology r, and let t' denote the resultant unresolved topology. In particular, 
r' will contain a single vertex of degree 4, and suppose the four subtrees attached to 
this vertex are denoted A, B, C, D. Then there are three topologies which resolve r' 
(including r): {{A, B), (C, D)), {{A, C) , {B, D)) and ((A, D), (B, C)). The process 
of continuously contracting an edge to give a degree-4 vertex, and then expanding 
out an alternative edge, is called nearest neighbour interchange (NNI). It follows 
that each codimension-1 face of an orthant Or corresponds to a set of trees which 
also lie on a codimension-1 face of exactly two other orthants related to r by NNI. 
Then, ignoring pendant edge weights, tree-space is the union IJ^ Or where the 
union is taken over fully resolved topologies, and triplets of orthants related by 
NNI are glued along codimension-1 faces. The codimension-fc faces of the orthants 
for 2 < k < N' also overlap and are glued in a similar way. However, it turns out 
that Brownian motion avoids these faces almost surely, and so the way they are 
glued together is much less important for our application than the codimension-1 
faces. When the weights of pendant edges are introduced, tree-space is the product 
K+ X Or', however, we will continue to ignore pendant edges for the remainder 
of the paper and take Tn =[JrOr- We let 7)^^^ C T/v be the set of trees containing 
at most N' — k internal edges for A: = 1,..., A^'. Equivalently, 7)^*^ is the union of 
the codimension-fc faces of the orthants. 

It is helpful to consider the structure of tree-space for the cases A^ = 4 and N = 5. 
When A^ = 4 every tree has just one internal edge, and there are three possible fully 
resolved tree topologies, as specified in the previous paragraph. Thus 7) consists 
of three copies of M+ glued together at the origin, and we write Ti = V^M+. Each 
‘arm’ of 71 corresponds to a different topology, and the position along the arm 
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specifies the length of the internal edge. The origin corresponds to the tree with 
no internal edges, called a star tree. For N = 5 there are 15 different fully resolved 
topologies, each containing 2 internal edges. In this case, T 5 consists of 15 copies 
of glued in sets of 3 along their edges. The origin of each orthant corresponds 
to the star tree. This article contains several figures showing a few orthants of T 5 
laid side-by-side in the plane: see Figure 1 for example. 

Billera, Holmes and Vogtmann proved the existence of a unique geodesic between 
any two points in Tn consisting of straight line segments in each orthant, and such 
that the length of the geodesic is given by the sum of the L^-lengths of the segments. 
We will refer to the corresponding metric as the geodesic distance on tree-space. 
Within each orthant it is the same as the Euclidean (L^) metric. However, beyond 
this, the structure and properties of tree-space geodesics are not directly relevant 
to our application. Tree-space Tn is equipped with the Borel sigma algebra defined 
using the geodesic metric. The restriction of these Borel sets to a single orthant 
Or are exactly the Borel subsets of . 

Although we ignore pendant edge weights for the remainder of this article, they 
are very readily dealt with. For each tree x G Tn, the pendant edge weights cor¬ 
respond to a point in If we consider Brownian motion on this space, with 

reflection at the boundaries, then at time to the distribution of particles starting 
from xq is given by a wrapped multivariate normal density with variance-covariance 
matrix to x I- Reflected random walks on converge to Brownian motion. Exis¬ 
tence of Brownian motion and convergence of random walks on the full tree-space 
R+ X U,. Or therefore follow immediately if we can establish these properties for 
the internal edges. 

2.2. Brownian motion in Euclidean space. Before considering Brownian mo¬ 
tion in tree-space, we start by establishing a result in Euclidean space. Lemma 2.1. 
This result characterizes the intersection of Brownian sample paths with the hy¬ 
perplanes in which correspond to one or more coordinates vanishing. 

Let X = , fix cco € AT and fix a time to > 0. Let Cxo[ 0 ,to]{X) denote the 

metric space of maps 77 : [ 0 ,to] T continuous with respect to the Euclidean 
metric d2{-, ■) on X which satisfy 77(0) = Xq. The metric between paths 771,772 G 
Cxo[ 0 ,to](T) is defined by 

doo(m,V 2 )= sup d 2 ( 771 ( 1 ), 772 ( 1 )). 
te[o.to] 

Then Brownian motion on X for particles starting from Xq and of duration Ig defines 
a distribution Bxo,to{X) on ( 73 , 0 [0,lo](Ar) equipped with the Borel sigma algebra. 

Define 

Hi = {(xi,... ,xn') G X : Xi = 0} for each i = 1,..., N' 

and 

= and = (J (n, nHj). 

i i^j 

We assume that xg G \ X^^\ so that at most one coordinate of xg is zero. Any 
path 77 drawn from misses the origin 0 S AT almost surely for all 1 > 0. 

In fact, because sample paths for Brownian motion on miss the origin almost 
surely, it follows that any path 77 drawn from Bx„^to{X) will almost surely have no 
intersection with AT^^^ and so there is almost surely a well defined, though possibly 
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infinite, set of intersections of rj with \ ordered according to increasing 

time. Each such intersection is with a single hyperplane, and the corresponding 
time-ordered set of hyperplane indices is denoted Tr{r]). If r]{t) G 11^ for some i and 
some t G (0, to), then it may be the case that there is an infinite set S) C {t — €,t + e) 
for some e > 0 such that r]{s) G Hi for all s G S) and such that 77 ( 5 ) n 11^ = 0 for all 
s G (t — e,t + e) and j yf i. In other words, given that ij hits a hyperplane 11^ there 
might be an infinite number of other hits with the same hyperplane before the path 
r] hits a different hyperplane. Given rj we define the pairwise distinct sequence 71 ( 77 ) 
to be the subsequence of 71 ( 77 ) obtained by ignoring any such repeated hits with the 
same hyperplane. We aim to prove that 77 ( 77 ) is almost surely finite. 

It is useful to set up some notation at this stage. For any sequence ii,i 2 , ■ ■ ■ ,ik G 
{!,...,TV'} define by 

Aui 2 ,--,ik = {v ^ Cxo[0,to]{X) : 71 ( 77 ) is well defined 

with 71 ( 77 ) = (ii, 72 , ■ • ■, ik) and 77 (^ 0 ) € X \ 

This definition includes for fc = 0 the set A* of paths which do not meet any 
hyperplane. These sets are Borel subsets of Cx^ [0, to](-’i’), as shown via the following 
inductive argument. First, A* is open, since it contains an e-neighbourhood of 
any of its elements. If we consider a path rj G Ai for some sequence of indices 
i = (ii,..., i}.), then for sufficiently small e any path 77 ' in the e-neighbourhood of 
77 either has 77 ( 77 ') = * or, in the case that 77 touches rather than crosses at least 

one hyperplane, 77 ( 77 ') can be a subsequence of i. Hence Ai U is an open 

set, where j < i means the sequence j is a subsequence of i with j ^ i. By the 
inductive hypothesis, each set Aj with j of length at most fc — 1 is a Borel set, and 
so Ai is Borel. We then define 

U 

1 • • • Ak 

and define A°° to be paths in the complement of (J^, which avoid X^‘^'> and which 
have 77 (^ 0 ) G X \ The sets A^ and A°° are also Borel sets, and A°° consists 

of paths rj for which 77 ( 77 ) is infinite. Finally, we define 

Bp = {rjG C',ro[ 0 ,to](^) : min inf {rjf{t) + 7j‘^(t)} > p} 
where rji{t), i = 1,..., N', denote the coordinates of rj. 

Lemma 2.1. The set A°° has measure zero with respect to the distribution Bx^ m{X). 
Equivalently, if rj is drawn from Bxg^toiX) then 77 ( 77 ) *5 al'most surely well-defined 
and finite. 

Proof. We aim to show that n Bp has measure zero for any fixed p > 0. It is 
simplest to first consider the case X = . The essential idea is that for a path 77 to 

hit Hi and II 2 infinitely many times, it must approach the origin arbitrarily closely. 
Suppose 77 is drawn from Bx^^tg^X) and assume that rj G A°° n Bp. Then 77 ( 77 ) 
is infinite, and so there is an increasing sequence of times Ti,T 2 ,... G [0, to] such 
that rji{Tr) = 0 or rj 2 {Tr) = 0 for each r. The index of the vanishing coordinate 
will alternate, so for all e > 0 we can find r such that T^+i — Tr < e, | 77 i(rr)| > p 
and rji{Tr+i) = 0. However, rji is a Brownian motion in M. If we consider a 
standard Brownian motion Z(t) on M with Z(0) = p, then by using the reflection 
principle, the event that Z(t) = 0 for t < e has probability 2 $(—p/e) where $ is 
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the cumulative distribution function of the standard normal distribution. It follows 
that 

Pr (77 e n Bp) < 2 $(-p/e) 
for all e, and so the probability is zero. 

For X = , again assume 77 is drawn from and that 77 G A°° H Bp 

for some fixed p > 0. For k = 1,... ,N' let A^f. be the set of paths 77 in A°° n Bp 
such that index k occurs infinitely often in 77 ( 77 ). Then 

(2.1) A^nBp= IJ A-,. 

An inductive argument, like that used above to prove Ai is Borel, shows that the 
subset of A°° n Bp consisting of paths 77 for which 77 ( 77 ) contains finitely many k 
is a Borel set. It follows that is also a Borel set. The proof in the previous 
paragraph for similarly shows that has measure zero and hence also A°°nBp 
by using equation (2.1). Finally, by taking the limit as p —?► 0 it follows that A°° 
has zero measure. □ □ 

It is also useful to consider the reflected Brownian motion on the positive orthant 
A_|_ = {(a;i,... ,xn') : Xi > 0 for all i}. The positive orthant is equipped with the 
standard Borel sigma algebra. The map 

(xi, . . . ,XAr/) i-A (|xi|,..., Ixtv'I) 
defined on X extends to give the reflection map 

: C,,jO,to](A) ^ ajO,to](A+) 

which is continuous. This then yields a well-defined probability measure B^^ ,toi^+) 
on C'a;o[0,to](A+) defined by 

( 2 . 2 ) B,,,to{X+){A) = B,„,t„(X)(7^-lA) 

for any measureable set A. Lemma 2.1 also applies to A_|_: for any sample path 
T] of the reflected Brownian motion the sequence 77 ( 77 ) is almost surely well-defined 
and finite. 


3. Brownian motion in tree-space 

In this section we give an explicit construction of Brownian motion on tree-space. 
The construction relies upon a projection map which takes paths in tree-space to 
paths in the positive orthant A+, and this projection is studied in Section 3.2. We 
show that the Markov process we construct behaves as we would intuitively expect: 
on the interior of orthants it is simply the Brownian motion in Euclidean space and 
at codimension-1 boundaries between orthants a diffusing particle moves with equal 
probability into each of the 3 neighbouring orthants relatived by nearest neighbour 
interchange. First, however, we briefly consider Brownian motion on 74 . 

3.1. Special case: 74 . Explicit solutions to the heat equation on 74 = V^K-|- have 
been constructed previously [12] and we briefly describe those solutions here. Let 
{x,k) denote a point in V^K-|- where x G K+ and k G {1,2,3} indexes the three 
‘arms’. If Brownian motion starts from (xq, 1), then at time to the density function 
is 

2 

/(x, k- Xq, t) = -(j){x; -Xo,t) + Ski ('/'(a:; xq, t) - cj){x; -Xq, t)) 


(3.1) 
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Figure 1. Definition of the map V for Ts- Left: three of the 
15 orthants of Ts are shown together with a path starting at xq. 
Right: projection of the path into X_|_ = Each section of the 
path in is a reflection of the corresponding section within a 
single orthant in 75 . 


where denotes the density of the normal distribution on K with mean 

/I and variance cr^. This is the same as a certain transition density used in [3] 
to prove existence of Brownian motion on 2-dimensional Euclidean complexes. By 
construction, Brownian motion on each arm is locally the same as Brownian motion 
on R; and given a particle hits the origin in some time interval (ti,t 2 ) then it is 
equally likely to be on each of the three arms at time t 2 - If we project V3R+ ^ R+ 
in the obvious way, the density function obtained is a wrapped normal distribution, 
corresponding to Brownian motion on R_|_. 

3.2. Projecting tree-space paths to Euclidean paths. Our construction of 
Brownian motion on tree-space depends on a projection map on paths in Tjv to 
paths in which is defined whenever the initial point xq is fully resolved. Paths 
on Tjv in the inverse image under projection of a path on X_f_ differ from one another 
in the ‘choice’ of orthant taken when traversing 7^^^ The projection enables us 
to write down a probability measure on sets of paths in tree-space in terms of the 
Brownian motion on X ^, and we will take this as our definition of Brownian motion 
on tree-space. 

In analogy to the notation in Section 2.2, we let [0, to](TY) be the metric 
space of continuous maps ij : [0,to] Tjv which satisfy 17 ( 0 ) = xq. The metric is 
defined by 

dooim^m) = sup d{r]i{t),r] 2 {t)) 
te[o,to] 

for any two 171,172 € Cxo[0,to]{TN) where d{-,-) denotes the geodesic distance. Fix 
a fully resolved tree Xq G Tn with topology r. Then, by fixing a particular ordering 
of the internal edges of the tree, we obtain a corresponding point in the interior of 
7 l_|_. By a slight abuse of notion, we also denote this point by Xq. We then define 
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the projection map V on paths in the following way. Define C by 

C = {t]& C'a;o[0,to]('7Ar) : im(77) n 7]^^^ = 0 and 77 (^ 0 ) & Tn\ 

For any path r] £ C we define a corresponding path V{r]) G [0, to](7f+) in the 
following way. Under the correspondence between the orthant Or containing xq and 
X_|_, we define V{r]) in the natural way until 77 first hits a codimension-1 face. At 
this point a certain edge e has zero length and the corresponding coordinate of V{r]) 
is zero. Then V{r]) is continued back into the interior of A+, maintaining all the 
coordinate values corresponding to edges other than e. The remaining coordinate is 
given by the length of the edge e* which replaced e on the path rj (so that e* is either 
e itself or one of the two edges obtained by NNI of e in r). This process continues 
for all t G [0, to] to define V{r]) and gives a continuous map V : C ^ Cxg [0, to](A+). 
Figure 1 illustrates how V is defined. 

Next for any sequence U, 72 ; • ■ • Ufe define by 


= {v & C'a:o[0;to](A_|-) : 71 ( 77 ) is well defined 

with 71 ( 77 ) = ( 7 i, 72 , and ri{to) G int(X+)}. 


This definition includes for fc = 0 the set C* of paths which do not meet any face. 
The results in Section 2.2 show that paths sampled from Bxg^to{^+) are contained 
in the union of the almost surely, so 

(3.2) E E = 1 . 

k>0 


The second summation is over all sequences U,..., 7 ^ of integers in {1,..., such 
that no consecutive pair of terms in the sequence are equal. We need to consider 
the pre-images 


a. 




which are all open subsets of C. Then, suppose for each codimension-1 face in 
Tn we fix an arbitrary numbering 1,2,3 of the adjacent orthants. Any path 77 in 
determines a sequence ji,j 2 , ■ ■ ■ ,jk G {1; 2, 3} such that jV is the number 
of the final orthant reached before face ir+i is hit for r — 1,... fc — 1 and jk is 
the number of the orthant containing r]{to). For compactness of notation, we let 
i = {ii,... ,ik) and j = {ji, ■ ■ ■, jk) represent sequences of indices. 

We then let Cf be the subset of Ci of paths which share the same sequence 
j = (ji; J 2 , • ■ •, jfc) and note that 


C7 = u 

JG{1,2,3}'‘ 


where the union is disjoint. It follows from the definition of V that 

n&i) = Ci 


for all choices of j. 

We will be particularly concerned with the set of Brownian sample paths which 
end in some particular region of tree-space. Let U be any subset of Tn- For any fc 
and sequences i = (U, ■ ■ ■ ,ik) and j = (ji,..., j^), let Cf{U) be the subset of 
consisting of paths 77 with r]{to) G U. This set will be empty unless the sequences 
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i,j determine a sequence of topologies ending in an orthant which intersects U. 
Similarly, given i = (ii, ..., let 

Ci{u) = U diiu). 

je{i,2.3}'= 

Lemma 3.1. I/UcTn is a Borel set, then C^{U) and V{Cj{U)) are also Borel 
sets in Cxo[0,to]{TN) and Cxo[0,to]{X^) respectively. It follows that Cf and Ci are 
also Borel sets. 

Proof. The inductive argument in Section 2.2 which showed that each set Ai is 
Borel also shows that the sets Ci C are Borel sets. If follows that for each 
i, Ci = 'P~^{Ci) is also a Borel set, since V is continuous. The proof that each 
set C^ is a Borel set uses the same kind of inductive argument. Fix some valid 
index vectors i and j of length k. If we pick some element rj G C^ and consider a 
small perturbation rj' of rj, then often 71 ( 77 ') = '^iv) 77 ' will have the same index 

sequence j. In fact, for sufficiently small perturbations, 71 ( 77 ') can only differ from 

/o\ 

71(77) when rj touches Tf; . It follows that the set 

cfu u Ci; 

i'.j' 

is open, where the union is taken over all subsequences i' < i and j' of length 
fc — 1 or less such that the pair {i' ,j) determines the same orthant as {i,j). By 
the inductive hypothesis, the sets are Borel sets, and so we conclude Cl is also 
a Borel set. Since the set of paths ending in some specified region t/ is a Borel 
set whenever U C T/v is Borel, it follows that Cl{U) is a Borel set. Finally, since 
V is injective when restricted to Cl, the image V{Cl{U)) is a Borel set, using [7], 
Theorem 15.1. □ □ 

To complete this section on the projection map V, we prove the following tech¬ 
nical lemma. 

Lemma 3.2. For any subset U of Tn and any sequences i = {ii,...,ik) and 
j = {jl, ■ ■ ■ Jk) 

( 3 . 3 ) cinv-^p(di{U))=ci{U) 

Proof. It is clear that V~^V{Ci{U)) contains Ci{U), and V~^V{Ci{U)) C Ci 
since V{Cl{U)) C Ci. It remains to show the left hand side of ( 3 . 3 ) contains no 
elements outside Cl (U). So suppose 77 S Cl{U) and 77' G Cl satisfy 'P{'q) = Vir]'), 
so that 77' is a general element of the left hand side of ( 3 . 3 ). Since both paths 
are contained in Cl, it follows that 77 (to) and rf [ttf) lie in the same orthant and 
also enter that orthant from the same codimension-1 face of tree-space. Then 
P{rf) = V{r]') implies that 77 and 77 ' are identical in the orthant containing time 
to, and in particular r]{to) = T]'{to) so 77' G Cl{U). This establishes equation ( 3 . 3 ). 
Note that it is not necessarily the case that 77 = 77': the paths can pass through 
different orthants whenever P{rj) hits the same face of multiple consecutive 
times - see Figure 2. □ □ 
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Figure 2. Example of two paths 771,172 in Ts with V{rii) = 7 ^( 772 ). 

Both paths lie in the same set (7| with k = 2 but traverse a dif¬ 
ferent set of topologies. Left: four of the 15 orthants of Ta- Right: 
projection of the paths into = 

3.3. Brownian motion on tree-space. 

Definition 3.3. Define the probability measure .to(TN) on C^g[0,to]{TN) with 
the Borel sigma algebra by 

00 

(3.4) = Y. E B,,,to{X+){V{And{)) 

k=0 i=(ii,....ifc) jG{1,2.3}'“ 

for any Borel set A. 

Lemma 3.4. The probability measure in equation (3.4) is well-defined. 

Proof. If >1 is a Borel set, then so is A n for any i,j. But V is injective when 
restricted to Cf, and so the image V{Ar\ Cf) is a Borel set, using [7], Theorem 
15.1. ft follows that each term in the sum is well-defined. Next, the total measure 
is given by 

00 

b.om(Tn){ u c{) = YT.^~''T.^-o.to{x+){v{di)) 

k—Q i j 

00 

k—O i j 

00 

= YY.^-o,toix+m) 

k—O i 

= 1 

using equation (3.2), since there are 3^ possibilities for the vector of indices j. 
Finally, suppose Ai,A2,... C [0,to](T/v) are disjoint Borel sets. We aim to 
show countable additivity. Without loss of generality we can assume each set Ai is 
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contained within a single set of the form Cf, since if this is not the case, we can 
decompose the sets Ai into a larger collection of disjoint sets. We also assume at 
most one set Ai is contained in each set Cf. Let ki denote the length of the index 
vector associated with the set containing Ai. Then 

OO 

B.„,to{rN){\jAi) = n\jAi)) 

k—0 i j 

I 

= y^^xo,to(7]v)(^/). 

I 

If there is more than one set contained in each the result still holds, since, for ex¬ 
ample, HA, Be Cl are disjoint then 5,^0.to = Bj;„^ta{X+){V{A)) + 

B,,,t,{X+){V{B)). □ □ 

The measure on paths determined by equation (3.4) corresponds to the ‘physi¬ 
cal’ diffusion of particles on tree-space: within each orthant particles diffuse as in 
Euclidean space, and when a particle hits a codimension-1 boundary it moves with 
equal probability into each neighbouring orthant. These properties can be proved 
formally, but they essentially follow from the symmetry of the probability measure 
Bxo.toiTN) under permutation of the indices j. For example, consider how particles 
behave near orthant boundaries. If ^ C Ca,,, [0, to](Tiv) is a set of paths such that 
each path crosses a particular codimension-1 boundary into a certain orthant Ox, 
then the same probability is given to a set A' where each path r/' € A! is identical to 
a path in A in the sense that V{t]) = V{t]') but the paths in A! cross into an orthant 
Ox’ where t' is related to r by nearest neighbour interchange in the codimension-1 
boundary. It follows that for 0 < t < to the position Z(t) G Tjv of a particle along 
a path sampled from is a well-defined Markov process. 

Brin and Kifer [3] proved existence of Brownian motion on certain 2-dimensional 
Euclidean complexes. Their proof, in contrast to Lemma 3.4, constructs a Markov 
transition kernel starting from the definition of the ‘physical’ diffusion process 
above. Their proof is similar to the proof of Lemma 2.1 above, in that it shows there 
are finitely many intersections between any Brownian sample path and successive 
codimension-1 regions, although it does not consider the possibility that sample 
paths could wind with increasing speed around codimension-2 points while still 
being bounded away from them. We have adopted a different approach by employ¬ 
ing the projection map V to construct Brownian motion as an explicit probability 
measure on sample paths, before interpreting this in terms of a Markov process on 
Ttv- This approach is more useful in order to prove convergence of random walks, 
as it enables us to base this proof on convergence in Euclidean space. 

4. Random walks in tree-space and convergence to Brownian motion 

We aim to prove that certain random walks on Bn converge to Brownian motion 
on Bn in an appropriate sense. We do this via random walks on Euclidean space 
X, which we define in the following way. Fix a number of steps m and consider the 
following algorithm: 
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Algorithm 4.1. Start with yo = xq G X. For j = 1,2,..., r where r = m x N', 
repeat the following: 

(1) Pick k uniformly at random from {1,..., N'}. 

(2) Sample a realization Sj from N{0,to/m) and let i* = bj +yj,k where yj^k is 
the k-th coordinate ofyj. 

(3) Set yj+i := yj but then change the k-th coordinate to yj+i^k = ■ 

Other versions of this algorithm could also be used. For example, rather than 
sampling which edge to change at step 1, a fixed permutation of edges can be used, 
with each edge incremented exactly m times at step 2. This algorithm also produces 
paths which converge to Brownian motion on X. 

By linearly interpolating between the collection of points 2/i) • ■ • > 2 /t- produced 
by the algorithm, so that each segment has time duration t^/r, we obtain an el¬ 
ement of We denote the distribution on Ca;o[0,to](^) induced by 

this procedure as ^^{X). Standard theory of Euclidean random walks shows 
that ^ B,,. ,to{X) as m —>■ oo where w denotes weak convergence of the 

probability measures [6] . We obtain the random walk on X^ by mapping paths on 
X to A+ via the reflection map. The corresponding distribution on [0, to](^-i-) 
is denoted W™ tQ(A+) and is defined by 

WZto^X+){A) = 

for any Borel set A. Random walks on A+ converge to Brownian motion: 

(4.1) asm^oo. 

This is a straightforward consequence of the convergence of the processes on X. 

Next we define the random walk on Bn via the following algorithm. 

Algorithm 4.2. Start with a fully resolved tree yo = Xq G Bn- We maintain a list 
L throughout, with L initially empty. For j = 1, 2,..., r where r = m x N' repeat 
the following: 

(1) Pick a split an internal split e from yj-i uniformly at random. 

(2) Sample a realization Sj from N{0,to/m) and let I — 6j (e) where 

is length of e in yj-i. 

(3) Set yj := yj-i but then change a single edge length as follows: 

(a) //r > 0 set£y^{e) := t. 

(b) Otherwise let e* be a split selected uniformly at random from the set 
{e, e',e"} where e',e" are splits associated with the two edges obtained 
by performing NNI of e in yj-i. Replace e with e* in yj and set 
ly. (e*) := —£*. If e* = e, let L := LU {j}. 

Modifications to the algorithm, which deal with the case that xq is unresolved, 
are discussed in section 5. As for Algorithm 4.1, there are alternative ways to define 
random walk on tree-space. For example, a fixed order of edges could be used at step 
1 of Algorithm 4.2 rather than randomly sampling edges; or a uniform distribution 
could be used to increment edge lengths at step 2. It is also possible to change 
all the edge lengths simultaneously by using a multivariate normal distribution, 
although the algorithm is more complicated and so we do not give the details here. 

We can use Algorithm 4.2 to simulate paths rj G C'a;„ [0, to](7)v) by linear inter¬ 
polation via geodesic segments between the points yo = Xq, j/i,..., j/r generated 
by the algorithm. This process is illustrated in Figure 3. Each geodesic segment 
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Figure 3. A realization of a random walk lying on three orthants 
in Ts with each orthant shown as a shaded square. At each step 
of the random walk a single edge changes length or is replaced 
with an alternative. The path between points a, b, c corresponds 
to a single step of the algorithm: from point a, the random walk 
reflected back from point b to give point c in the original orthant. 
In terms of Algorithm 4.2, this corresponds to a step for which £* 
was negative and for which e* = e. 


involves a single edge in the tree changing length and possibly being replaced with 
an NNI alternative. Each geodesic consists of either a straight line within a single 
orthant, or a straight line consisting of two segments in neighbouring orthants, as 
Figure 3 shows. The list L records those steps of the algorithm when the simulated 
random walk meets a face of an orthant and moves into the same orthant rather 
than either of the two neighbours. In the case that j € L, the points Uj-i and yj 
are not joined directly by the geodesic but by a path which touches 

the orthant boundary. Specifically, if e is the edge whose length differs between 
yj-i and yj, then the path consists of contracting e down to zero length from yj-i 
and then expanding the edge out to hit yj. In Figure 3 such a step occurs between 
points a, b, c. 

Lemma 4.3. If y is a path generated via Algorithm ^.2 then V{r]) has the distri¬ 
bution 

Proof. The vector of edge lengths defined by the algorithm is a random walk on on 
Xj., generated in exactly the same way as performing Algorithm 4.1 on X followed 
by the action of the reflection map TZ. D D 

Let denote the distribution onCxo [0,to](T/v) induced by Algorithm 4.2. 

The following lemma, which relates random walk on Tat to random walk on Xj. 
can be thought of as a tree-space analogue of the reflection principle. 
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Lemma 4.4. If U G Tn is a Borel set then for any i = {ii,... ,ik) and j = 
ijli •' • 1 jk) 


(4.2) 


wztoiTNKcim = 


Proof. Let A C C* be a Borel set for some i = {ii,... ,ik). Then consider the set of 
paths 'P~^{A). This set of paths is fully symmetric, in the sense that if it contains 
some path rj which hits a codimension -1 face in 7)v at time t then it also contains 
the paths 77 ', rj" which are identical to rj up to time t but then move into the other 
two possible orthants at time t and satisfy Vfq) = V{r]') = V{ri''). It follows from 
the definition of random walk on tree-space that 

(4.3) w^^^,^iTN){r-\A)) = iy-*^(X+)(kl). 


If 77 is sampled from IT™ (Tv), and given that rj G Ci for some i = (ii,..., ik), 
then it is equally likely that 77 is contained in each possible set Cf where j G 
{1,2,3}*^ since the direction taken when crossing codimension-1 faces is selected 
uniformly at random. Since there are 3^ possibilities for j, it follows that 

wZtoirN){v-\A)ndi) = 3-'=iT™,„(x+)(ki). 


By substituting \n A = V{Cf{U)) and applying Lemma 3.2, we obtain equa¬ 
tion (4.2). □ □ 


We will not in fact prove that IT™_(jj(7)v) ^ B^g^toiTw) as m —)■ 00 , but rather 
prove that 

wz,toirN){A) ^ B,„to{rN){A) 

for certain sets A consisting of all paths which end in given regions of 7)v at time to- 
This result is sufhcient to enable us to use Algorithm 4.2 to simulate approximately 
the distribution of particles in tree-space at time tg assuming they have undergone 
Brownian motion starting from xq - the main practical aim of this article. 

Definition 4.5. Let W{xo,to':'m) denote the distribution on Tn given by the end¬ 
points T/y. simulated by Algorithm 4.2, or equivalently given by sampling a path 77 
from IT™ (|j(7Ar) and taking r]{to). Similarly let B{xo,to) be the distribution on Tn 
induced in a similar way, but for which 77 is drawn from ,to('7w). Then 

W(To,to;m)(A) = IT™,„(riv)(C(A)) 

B{xoM){A) = B^,^to{rN){C{A)) 

for any measurable A C Tiv, where C{A) C C denotes the set of all paths which 
hit A at time to¬ 
ff P is a measure on a measurable metric space, then a P-continuity set P is a 
Borel set such that P{dU) = 0 where dU is the boundary of U. In order to prove 
weak convergence, it is sufficient to show that W{xo,to','kn){U) — B{xo,to){U) for 
any P(a;o, to)"Continuity set U. 

Lemma 4.6. If U G Tn is a B(xoAo)-continuity set, then 

(V Cl{U) is a B^„, .to{T n)- continuity set for any sequences i,j, and 
(2) 'P{Ci{U)) is a Bxo,to{X+)-continuity set. 
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Proof. Fix a i?(xo, <o)-continuity set U C Tn and sequences i,j of length k. Let 
Ot be the orthant corresponding to the end points of paths determined by the 
sequences i,j. The boundary of Cl{U) decomposes into two pieces: Cf{dU) and 
a set of paths which end at time to on 7^^^. Both these have zero measure with 
respect to i?a:o,to (Ttv) so: 

([/)) = B,,^t,{TN){ci{du)) = o 

and Cl (U) is a (77v)-continuity set. Given i,j, V is an isomorphism between 

Ci and Ci- It follows that dV{Ci (t/)) decomposes into two parts: V{Ci(dll)) and 
another set of paths which end on the boundary of and which therefore have 
zero measure with respect to Bxg^toi^+)- Thus if C/ is a i3(xo,to)-continuity set it 
follows that 

o = B,„,,„{rNKci{du)) 

= B^^^to{X+){'P{Cl{dU))) using Definition 3.3 

= i-^B^,^,,{X+){dV{Ci{U))) 

and so 'P{Cl(U)) is a continuity set with respect to ,toi^+)- n □ 

Theorem 4.7. 

(4.4) W (to, to; w) a B{xo, to) 
as m —> 00 . 

Proof. Fix any Borel set U C Tn, and fix fc, i = (ii, .. . ,ik) and j e {1,2,3}^. If 
t/ is a B3;g^tg(71v)-continuity set then equation (4.2) gives 

wz,t,iTN){ci{u)) = 3->‘wz,toix+)i'P{diiu))) 

(4.5) ^ 3-'=B,g,tg(X+)(lP(C^'(C/))) as m ^ 00 

Convergence in equation (4.5) occurs because random walk on X^ converges weakly 
to Brownian motion, and by Lemma 4.6 V{Cl (C/)) is a continuity set with respect 
to (7f_|_). 

Now for any continuity set U 

OO 

W{xo, to; ™)(C/) = E E E iTN){Ci{U)). 

k—0 i j 

Since this is bounded above by 1, for any e > 0 there exists K such that: 

k>K i j 

and 

\T.T.T.B^o,toiTN)idim\ < I 

k>K i j 

Then, by taking m sufficiently large 

IE T.T.{^Zto('T'N)ici{u))-B,^,,zTN){ci{u))) I < I 

k<K i j 
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and so \W{xo,to-,m){U) — B{xo,to){U)\ < e. This proves the weak convergence in 
Equation (4.4). □ □ 

5. Unresolved source xq 

In Theorem 4.7 it was assumed that xq was fully resolved. In this section we 
consider existence of Brownian motion and convergence of random walks when this 
assumption is dropped. Definition 3.3 and Theorem 4.7 both rely on the projection 
map V being well-defined and continuous. In fact, when xq contains — I internal 
edges, the definition of V in Section 3.2 does not require any modification and 
so Definition 3.3 and Lemma 3.4 are still valid when xq G Ti^'^ ■ The proof of 
Theorem 4.7 still holds in his situation provided we modify Algorithm 4.2 to deal 
with the case that Xq is unresolved. This is achieved by redefining j/q (the starting 
point for the random walk) in the following way. A fully resolved topology which 
resolves xq is chosen uniformly at random. Then, as a set of weighted splits, yo is 
taken to have this topology. We set iyo{e) = £xo(s) for all splits e in xg but set 
iyo{e) = 0 for the remaining splits in the fully resolved topology. Algorithm 4.2 
then proceeds in the same manner, sampling yi,... in turn. 

More generally, xg could lie in T2^\ but in this case there is no canonical 
definition of 7^. It is important to note that, due to non-trivial holonomy in tree- 
space, there is no continuous map 7)v Ai_|_ which acts as an isomorphism Or = 

A_|_ on each orthant when > 4. (The map V defined in Section 3.2 operates 

specifically on paths in tree-space, and so the existence of V when Xg is fully resolved 

( 2 ) 

does not contradict the previous statement.) A proof of convergence when XgGT^ 
might consider a small (5-neighbourhood of Xg. The Brownian motion will exit this 
neighbourhood at a fully-resolved tree uniformly at random on the boundary of the 
neighbourhood, and the proof of Theorem 4.7 would then apply from this point 
onwards. Clearly, the extension to the general case of unresolved xg requires more 
technical analysis than developed here, and we leave this as an open problem. In 
the special case that xg is the star-tree the Brownian motion is tractable: at time 
t the density on each orthant is proportional to a multivariate normal density at 
the origin and with variance-covariance matrix t x I. It is evident that the random 
walk from xg converges to this distribution. 

6. Concluding remarks 

We have introduced Brownian motion on tree-space as a means of constructing 
a family of distributions B{xg,tg). Our aim is to use these distributions to build 
more sophisticated probability models on tree-space, and perform inference for these 
models. The simplest model is to consider a set of data points xi,... ,Xn G Tn as 
being independent draws from B(xg,tg). Inference algorithms for the parameters 
xg,tg under this model are being developed and will be the subject of future pub¬ 
lications. However, these are based on the use of random walks to approximate 
Brownian motion on tree-space, and so an important first step has been to es¬ 
tablish convergence, as demonstrated in this article. Furthermore, the Brownian 
motion studied in this article could be generalized in several different ways by anal¬ 
ogy with diffusion processes in Euclidean space. For example, it might be possible 
to construct a diffusion with a non-trivial covariance structure corresponding to a 
diffusion with ‘preferred directions’ in tree-space. Similarly, it seems possible to 
adapt Brownian motion in tree-space in order to define a mean reverting stochastic 
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process, an analogue of the Ornstein-Uhlenbeck process. Inference for models using 
these stochastic process is likely to rely on forward simulation, and will therefore 
build on the existence and convergence results established in this article. Instead 
of considering more general stochastic processes on tree-space, a second direction 
is to consider random walks and Brownian motion on a wider class of Euclidean 
complexes. The general approach adopted in this paper, using the projection map 
V to map paths down to Euclidean space and to symmetrize around codimension-1 
boundaries, could be adapted to prove convergence of random walks to Brownian 
motion on other Euclidean complexes or even manifold-stratified spaces. 
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